1 Introduction

Comprehensive quality control (QC) of single-cell RNA-seq data was performed with the singleCellTK package. This report contains information about each QC tool and visualization of the QC metrics for each sample. For more information on running this pipeline and performing quality control, see the documentation. If you use the singleCellTK package for quality control, please include a reference in your publication.

2 Summary Statistics

2.1 SCTK-QC

Metric                                                    d8b737fb  8628f96c  e7372715  84824920  c47a5959  All Samples
Number of Cells                                               4114      2920      5464      3683      3804        19985
Mean counts                                                 6792.4      2291      3494    3253.4      2280       3721.8
Median counts                                               6925.5    1520.5    2705.5      2272      1639         2581
Mean features detected                                      2533.1    1232.8    1651.8    1554.9    1219.8       1671.9
Median features detected                                    2670.5      1005      1526      1358      1038         1467
scDblFinder - Number of doublets                               198       123       413       214       360         1308
scDblFinder - Percentage of doublets                          4.81      4.21      7.56      5.81      9.46         6.54
DoubletFinder - Number of doublets, Resolution 1.5             309       219       410       276       285         1499
DoubletFinder - Percentage of doublets, Resolution 1.5        7.51       7.5       7.5      7.49      7.49          7.5
CXDS - Number of doublets                                      361       399       744       560       407         2471
CXDS - Percentage of doublets                                 8.77      13.7      13.6      15.2      10.7         12.4
BCDS - Number of doublets                                      158       252       371       438       240         1459
BCDS - Percentage of doublets                                 3.84      8.63      6.79      11.9      6.31          7.3
SCDS Hybrid - Number of doublets                               195       374       443       523       380         1915
SCDS Hybrid - Percentage of doublets                          4.74      12.8      8.11      14.2      9.99         9.58
DecontX - Mean contamination                                0.0649    0.0626     0.104     0.115     0.125       0.0961
DecontX - Median contamination                              0.0335    0.0314    0.0697    0.0677      0.08       0.0542

The summary statistics table summarizes QC metrics of the cell matrix: the mean and median of UMI counts and of genes detected per cell, the number and percentage of doublets called by each doublet detection tool, and the mean and median ambient RNA contamination estimated by DecontX. Values are given for each sample and for all samples combined.
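The percentage rows can be reproduced from the count rows. The following is a minimal Python sketch, not part of singleCellTK, that recomputes the scDblFinder percentages from the cell and doublet counts in the table above; the variable and function names are illustrative only.

```python
# Cell and scDblFinder doublet counts copied from the summary table.
cells = {"d8b737fb": 4114, "8628f96c": 2920, "e7372715": 5464,
         "84824920": 3683, "c47a5959": 3804}
doublets = {"d8b737fb": 198, "8628f96c": 123, "e7372715": 413,
            "84824920": 214, "c47a5959": 360}


def doublet_percentage(n_doublets, n_cells):
    """Percentage of cells called doublets, rounded to two decimals."""
    return round(100 * n_doublets / n_cells, 2)


per_sample = {s: doublet_percentage(doublets[s], cells[s]) for s in cells}
# The "All Samples" column pools the cells of all samples before dividing.
all_samples = doublet_percentage(sum(doublets.values()), sum(cells.values()))

print(per_sample)
print(all_samples)
```

Because the "All Samples" value is computed on the pooled cells, it is weighted by sample size and differs from the plain average of the five per-sample percentages.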

3 General quality control metrics

SingleCellTK uses the scater package to compute cell-level QC metrics. The wrapper function runPerCellQC can be used to compute these QC metrics on its own, and the wrapper function plotRunPerCellQCResults can be used to plot the general QC outputs. The QC outputs are sum, detected, and percent_top_X. sum contains the total number of counts for each cell. detected contains the total number of features detected in each cell. percent_top_X contains the percentage of the total counts that is made up by the expression of the top X genes for each cell. The subsets_ columns contain information for the specific gene list that was used. For instance, if a gene list containing mitochondrial genes named mito was used, subsets_mito_sum would contain the total number of mitochondrial counts for each cell.
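To make the definitions of these metrics concrete, the following Python sketch computes them for a single toy cell. It is an illustration of the definitions given above, not the scater or singleCellTK implementation; the function name per_cell_qc, the gene names, and the toy counts are assumptions made for the example.

```python
def per_cell_qc(counts, top_n=(50,), subsets=None, detection_limit=0):
    """Compute sum, detected, percent_top_X and subset sums for one cell.

    counts: dict mapping gene name -> count for a single cell.
    subsets: dict mapping subset name -> collection of gene names.
    """
    total = sum(counts.values())
    metrics = {
        "sum": total,
        # A feature counts as detected when it exceeds the detection limit.
        "detected": sum(1 for c in counts.values() if c > detection_limit),
    }
    ranked = sorted(counts.values(), reverse=True)
    for n in top_n:
        top_total = sum(ranked[:n])
        metrics[f"percent_top_{n}"] = 100 * top_total / total if total else 0.0
    for name, genes in (subsets or {}).items():
        metrics[f"subsets_{name}_sum"] = sum(counts.get(g, 0) for g in genes)
    return metrics


# Toy cell with five genes; two of them stand in for a gene list named "mito".
cell = {"GAPDH": 40, "ACTB": 30, "MT-CO1": 20, "MT-ND1": 10, "CD3E": 0}
qc = per_cell_qc(cell, top_n=(2,), subsets={"mito": ["MT-CO1", "MT-ND1"]})
print(qc)
```

With detectionLimit set to 0, as in this report, a gene is counted as detected as soon as it has at least one count in the cell.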

3.1 Total Counts

3.2 Total Features

3.3 Percentage of Library Size Occupied by Top 50 Expressed Features

3.4 Parameters

useAssay counts
collectionName NULL
geneSetList NULL
geneSetListLocation rownames
mitoRef NULL
mitoIDType NULL
mitoPrefix NULL
mitoID NULL
mitoGeneLocation NULL
percent_top 50 100 200 500
use_altexps FALSE
flatten TRUE
detectionLimit 0
packageVersion 1.22.1

In this function, the inSCE parameter is the input SingleCellExperiment object, and the useAssay parameter is the name of the assay in that SingleCellExperiment object that the user wishes to use.

4 Doublet Detection

4.1 Doublet Detection Summary

4.1.1 scDblFinder

4.1.2 Scds_Cxds

4.1.3 Scds_Bcds

4.1.4 Scds_Hybrid

4.1.5 doubletFinder_1.5

4.2 DoubletFinder

DoubletFinder is a doublet detection algorithm that depends on the single-cell analysis package Seurat. The wrapper function runDoubletFinder can be used to run the DoubletFinder algorithm on its own, and the wrapper function plotDoubletFinderResults can be used to plot its QC outputs. The DoubletFinder outputs are doubletFinder_doublet_score, a numeric variable giving the likelihood that a cell is a doublet, and doubletFinder_doublet_label, the assignment of whether the cell is a doublet.

4.2.1 d8b737fb

4.2.1.1 Resolution: 1.5

4.2.1.1.1 DoubletFinder Doublet Assignment

4.2.1.1.2 DoubletFinder Doublet Score

4.2.1.1.3 Density of Doublet Score

4.2.1.1.4 Violin of Doublet Score

4.2.1.1.5 Parameters
useAssay counts
seed 12345
seuratNfeatures 2000
seuratPcs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
seuratRes 1.5
formationRate 0.075
nCores NULL
verbose FALSE
packageVersion 2.0.2

runDoubletFinder relies on a Seurat parameter called resolution to determine cells that may be doublets. Users can set the resolution through seuratRes. If multiple values are stored in seuratRes, there will be multiple sets of labels and scores, one per resolution. The seuratNfeatures parameter determines the number of features used in the FindVariableFeatures function in Seurat. The seuratPcs parameter determines the dimensions used in the FindNeighbors function in Seurat. The formationRate parameter is the estimated doublet formation rate in the dataset. DoubletFinder aims to detect doublets by creating simulated doublets from combining transcriptomic profiles of existing cells in the dataset.
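The DoubletFinder rows of the summary table in section 2.1 are consistent with the number of doublet calls per sample being the formation rate multiplied by the number of cells, rounded to the nearest integer. The Python sketch below checks this against the reported values. It is a consistency check on the numbers in this report under that assumption, not DoubletFinder code.

```python
formation_rate = 0.075  # formationRate from the parameter table

# Number of cells per sample, copied from the summary table.
cells = {"d8b737fb": 4114, "8628f96c": 2920, "e7372715": 5464,
         "84824920": 3683, "c47a5959": 3804}
# DoubletFinder doublet counts reported in the summary table.
reported = {"d8b737fb": 309, "8628f96c": 219, "e7372715": 410,
            "84824920": 276, "c47a5959": 285}

# Expected number of doublets under the assumed rule round(rate * cells).
expected = {s: round(formation_rate * n) for s, n in cells.items()}

for sample in cells:
    print(sample, expected[sample], reported[sample])
```

This would also explain why the DoubletFinder percentage is close to 7.5 for every sample, whereas the other tools report percentages that vary between samples.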

4.2.2 8628f96c

4.2.2.1 Resolution: 1.5

4.2.2.1.1 DoubletFinder Doublet Assignment

4.2.2.1.2 DoubletFinder Doublet Score

4.2.2.1.3 Density of Doublet Score

4.2.2.1.4 Violin of Doublet Score

4.2.2.1.5 Parameters
useAssay counts
seed 12345
seuratNfeatures 2000
seuratPcs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
seuratRes 1.5
formationRate 0.075
nCores NULL
verbose FALSE
packageVersion 2.0.2

runDoubletFinder relies on a Seurat parameter called resolution to determine cells that may be doublets. Users can set the resolution through seuratRes. If multiple values are stored in seuratRes, there will be multiple sets of labels and scores, one per resolution. The seuratNfeatures parameter determines the number of features used in the FindVariableFeatures function in Seurat. The seuratPcs parameter determines the dimensions used in the FindNeighbors function in Seurat. The formationRate parameter is the estimated doublet formation rate in the dataset. DoubletFinder aims to detect doublets by creating simulated doublets from combining transcriptomic profiles of existing cells in the dataset.

4.2.3 e7372715

4.2.3.1 Resolution: 1.5

4.2.3.1.1 DoubletFinder Doublet Assignment

4.2.3.1.2 DoubletFinder Doublet Score

4.2.3.1.3 Density of Doublet Score

4.2.3.1.4 Violin of Doublet Score

4.2.3.1.5 Parameters
useAssay counts
seed 12345
seuratNfeatures 2000
seuratPcs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
seuratRes 1.5
formationRate 0.075
nCores NULL
verbose FALSE
packageVersion 2.0.2

runDoubletFinder relies on a Seurat parameter called resolution to determine cells that may be doublets. Users can set the resolution through seuratRes. If multiple values are stored in seuratRes, there will be multiple sets of labels and scores, one per resolution. The seuratNfeatures parameter determines the number of features used in the FindVariableFeatures function in Seurat. The seuratPcs parameter determines the dimensions used in the FindNeighbors function in Seurat. The formationRate parameter is the estimated doublet formation rate in the dataset. DoubletFinder aims to detect doublets by creating simulated doublets from combining transcriptomic profiles of existing cells in the dataset.

4.2.4 84824920

4.2.4.1 Resolution: 1.5

4.2.4.1.1 DoubletFinder Doublet Assignment

4.2.4.1.2 DoubletFinder Doublet Score

4.2.4.1.3 Density of Doublet Score

4.2.4.1.4 Violin of Doublet Score

4.2.4.1.5 Parameters
useAssay counts
seed 12345
seuratNfeatures 2000
seuratPcs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
seuratRes 1.5
formationRate 0.075
nCores NULL
verbose FALSE
packageVersion 2.0.2

runDoubletFinder relies on a Seurat parameter called resolution to determine cells that may be doublets. Users can set the resolution through seuratRes. If multiple values are stored in seuratRes, there will be multiple sets of labels and scores, one per resolution. The seuratNfeatures parameter determines the number of features used in the FindVariableFeatures function in Seurat. The seuratPcs parameter determines the dimensions used in the FindNeighbors function in Seurat. The formationRate parameter is the estimated doublet formation rate in the dataset. DoubletFinder aims to detect doublets by creating simulated doublets from combining transcriptomic profiles of existing cells in the dataset.

4.2.5 c47a5959

4.2.5.1 Resolution: 1.5

4.2.5.1.1 DoubletFinder Doublet Assignment

4.2.5.1.2 DoubletFinder Doublet Score

4.2.5.1.3 Density of Doublet Score

4.2.5.1.4 Violin of Doublet Score

4.2.5.1.5 Parameters
useAssay counts
seed 12345
seuratNfeatures 2000
seuratPcs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
seuratRes 1.5
formationRate 0.075
nCores NULL
verbose FALSE
packageVersion 2.0.2

runDoubletFinder relies on a Seurat parameter called resolution to determine cells that may be doublets. Users can set the resolution through seuratRes. If multiple values are stored in seuratRes, there will be multiple sets of labels and scores, one per resolution. The seuratNfeatures parameter determines the number of features used in the FindVariableFeatures function in Seurat. The seuratPcs parameter determines the dimensions used in the FindNeighbors function in Seurat. The formationRate parameter is the estimated doublet formation rate in the dataset. DoubletFinder aims to detect doublets by creating simulated doublets from combining transcriptomic profiles of existing cells in the dataset.

4.3 ScDblFinder

scDblFinder is a doublet detection algorithm from the scDblFinder package. It aims to detect doublets by creating simulated doublets from existing cells and projecting them into the same PCA space as the cells. The wrapper function runScDblFinder can be used to run the scDblFinder algorithm on its own, and the wrapper function plotScDblFinderResults can be used to plot its QC outputs. The outputs of scDblFinder are scDblFinder_doublet_score and scDblFinder_doublet_call. The doublet score of a droplet will be higher if it is deemed likely to be a doublet.
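The idea of building a simulated doublet from existing cells can be sketched as follows. This Python example assumes that an artificial doublet is formed by summing the count vectors of two randomly chosen, distinct cells; the function simulate_doublets and the toy matrix are hypothetical, and the actual scDblFinder procedure is more elaborate.

```python
import random


def simulate_doublets(cells, n_doublets, seed=12345):
    """Create artificial doublets by summing the counts of two distinct cells.

    cells: list of equal-length count vectors (one per cell).
    Returns a list of (parent_a, parent_b, profile) tuples.
    """
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(cells)), 2)  # two different parent cells
        profile = [x + y for x, y in zip(cells[a], cells[b])]
        doublets.append((a, b, profile))
    return doublets


# Toy matrix: three cells, four genes.
toy_cells = [[5, 0, 1, 0], [0, 7, 0, 2], [1, 1, 9, 0]]
simulated = simulate_doublets(toy_cells, n_doublets=2)
for a, b, profile in simulated:
    print(a, b, profile)
```

Fixing the seed, as the report does with seed 12345, makes the set of simulated doublets reproducible between runs.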

4.3.1 d8b737fb

4.3.1.1 ScDblFinder Doublet Assignment

4.3.1.2 ScDblFinder Doublet Score

4.3.1.3 Density Score

4.3.1.4 Violin Score

4.3.1.5 Parameters

useAssay counts
nNeighbors 50
simDoublets 19985
seed 12345
packageVersion 1.8.0

The nNeighbors parameter is the number of nearest neighbors used to calculate the density for doublet detection. The simDoublets parameter sets the number of simulated doublets used for doublet detection.
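The role of the neighborhood size can be illustrated with a simple nearest-neighbor score: for each real cell, the fraction of its k nearest neighbors that are simulated doublets. The Python sketch below is a simplified illustration under that assumption and is not the scoring used by scDblFinder; the function name and the two-dimensional toy coordinates (standing in for a PCA space) are made up for the example.

```python
import math


def knn_doublet_fraction(real_cells, simulated, k):
    """For each real cell, the fraction of its k nearest neighbors
    (among all other real cells and simulated doublets) that are simulated."""
    points = [(p, False) for p in real_cells] + [(p, True) for p in simulated]
    scores = []
    for i, cell in enumerate(real_cells):
        # Distances to every other point; index i is the cell itself.
        others = [(math.dist(cell, p), is_sim)
                  for j, (p, is_sim) in enumerate(points) if j != i]
        others.sort(key=lambda t: t[0])
        nearest = others[:k]
        scores.append(sum(is_sim for _, is_sim in nearest) / k)
    return scores


# Two tight clusters of real cells plus one real cell lying between them.
real = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
        (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
        (2.5, 2.5)]
# Simulated doublets of the two clusters fall between the clusters.
sim = [(2.4, 2.5), (2.6, 2.5), (2.5, 2.6)]
scores = knn_doublet_fraction(real, sim, k=3)
print(scores)
```

In this toy example the cell located between the two clusters is surrounded by simulated doublets and receives the highest score, while cells inside the clusters have only real neighbors.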

4.3.2 8628f96c

4.3.2.1 ScDblFinder Doublet Assignment

4.3.2.2 ScDblFinder Doublet Score

4.3.2.3 Density Score

4.3.2.4 Violin Score

4.3.2.5 Parameters

useAssay counts
nNeighbors 50
simDoublets 19985
seed 12345
packageVersion 1.8.0

The nNeighbors parameter is the number of nearest neighbors used to calculate the density for doublet detection. The simDoublets parameter sets the number of simulated doublets used for doublet detection.

4.3.3 e7372715

4.3.3.1 ScDblFinder Doublet Assignment

4.3.3.2 ScDblFinder Doublet Score

4.3.3.3 Density Score

4.3.3.4 Violin Score

4.3.3.5 Parameters

useAssay counts
nNeighbors 50
simDoublets 19985
seed 12345
packageVersion 1.8.0

The nNeighbors parameter is the number of nearest neighbors used to calculate the density for doublet detection. The simDoublets parameter sets the number of simulated doublets used for doublet detection.

4.3.4 84824920

4.3.4.1 ScDblFinder Doublet Assignment

4.3.4.2 ScDblFinder Doublet Score

4.3.4.3 Density Score

4.3.4.4 Violin Score

4.3.4.5 Parameters

useAssay counts
nNeighbors 50
simDoublets 19985
seed 12345
packageVersion 1.8.0

The nNeighbors parameter is the number of nearest neighbors used to calculate the density for doublet detection. The simDoublets parameter sets the number of simulated doublets used for doublet detection.

4.3.5 c47a5959

4.3.5.1 ScDblFinder Doublet Assignment

4.3.5.2 ScDblFinder Doublet Score

4.3.5.3 Density Score

4.3.5.4 Violin Score

4.3.5.5 Parameters

useAssay counts
nNeighbors 50
simDoublets 19985
seed 12345
packageVersion 1.8.0

The nNeighbors parameter is the number of nearest neighbors used to calculate the density for doublet detection. The simDoublets parameter sets the number of simulated doublets used for doublet detection.

4.4 Cxds

CXDS, or co-expression based doublet scoring, is an algorithm in the SCDS package which employs a binomial model for the co-expression of pairs of genes to determine doublets. The wrapper function runCxds can be used to separately run the CXDS algorithm on its own. The wrapper function plotCxdsResults can be used to plot the QC outputs from the CXDS algorithm. The output of runCxds is the doublet score, scds_cxds_score.
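The binomial co-expression idea can be sketched in a few lines of Python. The example assumes a simplified scheme: each gene pair is scored by how surprising it is, under a binomial null model of independent expression, that the two genes are so often expressed in different cells, and a cell's score is the sum of the scores of the pairs it co-expresses. The function names and the toy data are hypothetical, and the scds implementation may differ in its details.

```python
import math
from itertools import combinations


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(k, n + 1))


def coexpression_scores(binary):
    """binary: dict gene -> list of 0/1 expression indicators per cell.

    A gene pair scores high when the two genes are expressed in different
    cells more often than expected by chance. A cell's score is the sum of
    the pair scores over all pairs that the cell co-expresses.
    """
    n = len(next(iter(binary.values())))
    cell_scores = [0.0] * n
    for g1, g2 in combinations(binary, 2):
        a, b = binary[g1], binary[g2]
        p1, p2 = sum(a) / n, sum(b) / n
        # Probability that exactly one of the two genes is expressed in a
        # cell if the genes were expressed independently.
        p_one = p1 * (1 - p2) + p2 * (1 - p1)
        k_one = sum(1 for x, y in zip(a, b) if x != y)
        pair_score = max(0.0, -math.log(binom_upper_tail(k_one, n, p_one)))
        for i, (x, y) in enumerate(zip(a, b)):
            if x and y:
                cell_scores[i] += pair_score
    return cell_scores


# Nine toy cells: geneA and geneB are mutually exclusive except in the last cell.
binary = {"geneA": [1, 1, 1, 1, 0, 0, 0, 0, 1],
          "geneB": [0, 0, 0, 0, 1, 1, 1, 1, 1]}
cxds_like = coexpression_scores(binary)
print(cxds_like)
```

Only the last toy cell co-expresses the two otherwise mutually exclusive genes, so it is the only cell with a non-zero score.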

4.4.1 d8b737fb

4.4.1.1 Cxds Doublet Assignment

4.4.1.2 Cxds Doublet Score

4.4.1.3 Density Score

4.4.1.4 Violin Score

4.4.1.5 Parameters

seed 12345
ntop 500
binThresh 0
verb FALSE
retRes FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runCxds, the ntop parameter is the number of top variance genes to consider. The binThresh parameter is the minimum counts a gene needs to have to be included in the analysis. The verb parameter determines whether progress messages are displayed. The retRes parameter determines whether the gene pair results are returned. The estNdbl parameter determines whether the number of doublets is estimated from the data, which enables doublet calls.

4.4.2 8628f96c

4.4.2.1 Cxds Doublet Assignment

4.4.2.2 Cxds Doublet Score

4.4.2.3 Density Score

4.4.2.4 Violin Score

4.4.2.5 Parameters

seed 12345
ntop 500
binThresh 0
verb FALSE
retRes FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runCxds, the ntop parameter is the number of top variance genes to consider. The binThresh parameter is the minimum counts a gene needs to have to be included in the analysis. The verb parameter determines whether progress messages are displayed. The retRes parameter determines whether the gene pair results are returned. The estNdbl parameter determines whether the number of doublets is estimated from the data, which enables doublet calls.

4.4.3 e7372715

4.4.3.1 Cxds Doublet Assignment

4.4.3.2 Cxds Doublet Score

4.4.3.3 Density Score

4.4.3.4 Violin Score

4.4.3.5 Parameters

seed 12345
ntop 500
binThresh 0
verb FALSE
retRes FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runCxds, the ntop parameter is the number of top variance genes to consider. The binThresh parameter is the minimum counts a gene needs to have to be included in the analysis. The verb parameter determines whether progress messages are displayed. The retRes parameter determines whether the gene pair results are returned. The estNdbl parameter determines whether the number of doublets is estimated from the data, which enables doublet calls.

4.4.4 84824920

4.4.4.1 Cxds Doublet Assignment

4.4.4.2 Cxds Doublet Score

4.4.4.3 Density Score

4.4.4.4 Violin Score

4.4.4.5 Parameters

seed 12345
ntop 500
binThresh 0
verb FALSE
retRes FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runCxds, the ntop parameter is the number of top variance genes to consider. The binThresh parameter is the minimum counts a gene needs to have to be included in the analysis. The verb parameter determines whether progress messages are displayed. The retRes parameter determines whether the gene pair results are returned. The estNdbl parameter determines whether the number of doublets is estimated from the data, which enables doublet calls.

4.4.5 c47a5959

4.4.5.1 Cxds Doublet Assignment

4.4.5.2 Cxds Doublet Score

4.4.5.3 Density Score

4.4.5.4 Violin Score

4.4.5.5 Parameters

seed 12345
ntop 500
binThresh 0
verb FALSE
retRes FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runCxds, the ntop parameter is the number of top variance genes to consider. The binThresh parameter is the minimum counts a gene needs to have to be included in the analysis. The verb parameter determines whether progress messages are displayed. The retRes parameter determines whether the gene pair results are returned. The estNdbl parameter determines whether the number of doublets is estimated from the data, which enables doublet calls.

4.5 Bcds

BCDS, or binary classification based doublet scoring, is an algorithm in the SCDS package which uses a binary classification approach to determine doublets. The wrapper function runBcds can be used to run the BCDS algorithm on its own, and the wrapper function plotBcdsResults can be used to plot its QC outputs. The output of runBcds is scds_bcds_score, which is the likelihood that a cell is a doublet.

4.5.1 d8b737fb

4.5.1.1 Bcds Doublet Assignment

4.5.1.2 Bcds Doublet Score

4.5.1.3 Density Score

4.5.1.4 Violin Score

4.5.1.5 Parameters

seed 12345
ntop 500
srat 1
verb FALSE
retRes FALSE
nmax tune
varImp FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runBcds, the ntop parameter is the number of top variance genes to consider. The srat parameter is the ratio between original number of cells and simulated doublets. The nmax parameter is the maximum number of cycles that the algorithm should run through. If set to tune, this will be automatic. The varImp parameter determines if the variable importance should be returned or not.

4.5.2 8628f96c

4.5.2.1 Bcds Doublet Assignment

4.5.2.2 Bcds Doublet Score

4.5.2.3 Density Score

4.5.2.4 Violin Score

4.5.2.5 Parameters

seed 12345
ntop 500
srat 1
verb FALSE
retRes FALSE
nmax tune
varImp FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runBcds, the ntop parameter is the number of top variance genes to consider. The srat parameter is the ratio between original number of cells and simulated doublets. The nmax parameter is the maximum number of cycles that the algorithm should run through. If set to tune, this will be automatic. The varImp parameter determines if the variable importance should be returned or not.

4.5.3 e7372715

4.5.3.1 Bcds Doublet Assignment

4.5.3.2 Bcds Doublet Score

4.5.3.3 Density Score

4.5.3.4 Violin Score

4.5.3.5 Parameters

seed 12345
ntop 500
srat 1
verb FALSE
retRes FALSE
nmax tune
varImp FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runBcds, the ntop parameter is the number of top variance genes to consider. The srat parameter is the ratio between original number of cells and simulated doublets. The nmax parameter is the maximum number of cycles that the algorithm should run through. If set to tune, this will be automatic. The varImp parameter determines if the variable importance should be returned or not.

4.5.4 84824920

4.5.4.1 Bcds Doublet Assignment

4.5.4.2 Bcds Doublet Score

4.5.4.3 Density Score

4.5.4.4 Violin Score

4.5.4.5 Parameters

seed 12345
ntop 500
srat 1
verb FALSE
retRes FALSE
nmax tune
varImp FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runBcds, the ntop parameter is the number of top variance genes to consider. The srat parameter is the ratio between original number of cells and simulated doublets. The nmax parameter is the maximum number of cycles that the algorithm should run through. If set to tune, this will be automatic. The varImp parameter determines if the variable importance should be returned or not.

4.5.5 c47a5959

4.5.5.1 Bcds Doublet Assignment

4.5.5.2 Bcds Doublet Score

4.5.5.3 Density Score

4.5.5.4 Violin Score

4.5.5.5 Parameters

seed 12345
ntop 500
srat 1
verb FALSE
retRes FALSE
nmax tune
varImp FALSE
estNdbl TRUE
useAssay counts
packageVersion 1.10.0

In runBcds, the ntop parameter is the number of top variance genes to consider. The srat parameter is the ratio between original number of cells and simulated doublets. The nmax parameter is the maximum number of cycles that the algorithm should run through. If set to tune, this will be automatic. The varImp parameter determines if the variable importance should be returned or not.

4.6 ScdsHybrid

The CXDS-BCDS hybrid algorithm uses both the CXDS and BCDS algorithms from the SCDS package. The wrapper function runCxdsBcdsHybrid can be used to run the CXDS-BCDS hybrid algorithm on its own, and the wrapper function plotScdsHybridResults can be used to plot its QC outputs. The output of runCxdsBcdsHybrid is the doublet score, scds_hybrid_score.
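One simple way to combine two doublet scores that live on different scales is to rescale each to the interval from 0 to 1 and add them. The Python sketch below illustrates this under the assumption of min-max rescaling followed by a sum; the exact normalization used by the scds package should be checked against its documentation, and the input scores here are made up.

```python
def rescale(scores):
    """Min-max rescale a list of scores to the interval [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_score(cxds_scores, bcds_scores):
    """Sum of the rescaled co-expression and classifier scores per cell."""
    return [c + b for c, b in zip(rescale(cxds_scores), rescale(bcds_scores))]


cxds = [10.0, 250.0, 40.0, 130.0]  # made-up co-expression scores
bcds = [0.05, 0.90, 0.10, 0.30]    # made-up classifier scores
hybrid = hybrid_score(cxds, bcds)
print(hybrid)
```

Rescaling first keeps the score with the larger numeric range from dominating the combined score.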

4.6.1 d8b737fb

4.6.1.1 ScdsHybrid Doublet Assignment

4.6.1.2 ScdsHybrid Doublet Score

4.6.1.3 Density Score

4.6.1.4 Violin Score

4.6.1.5 Parameters

seed 12345
nTop 500
cxdsArgs NULL
bcdsArgs NULL
verb FALSE
estNdbl TRUE
force FALSE
useAssay counts
packageVersion 1.10.0

All parameters from the runCxds and runBcds functions may be passed to this function through the cxdsArgs and bcdsArgs parameters, respectively.

4.6.2 8628f96c

4.6.2.1 ScdsHybrid Doublet Assignment

4.6.2.2 ScdsHybrid Doublet Score

4.6.2.3 Density Score

4.6.2.4 Violin Score

4.6.2.5 Parameters

seed 12345
nTop 500
cxdsArgs NULL
bcdsArgs NULL
verb FALSE
estNdbl TRUE
force FALSE
useAssay counts
packageVersion 1.10.0

All parameters from the runCxds and runBcds functions may be passed to this function through the cxdsArgs and bcdsArgs parameters, respectively.

4.6.3 e7372715

4.6.3.1 ScdsHybrid Doublet Assignment

4.6.3.2 ScdsHybrid Doublet Score

4.6.3.3 Density Score

4.6.3.4 Violin Score

4.6.3.5 Parameters

seed 12345
nTop 500
cxdsArgs NULL
bcdsArgs NULL
verb FALSE
estNdbl TRUE
force FALSE
useAssay counts
packageVersion 1.10.0

All parameters from the runCxds and runBcds functions may be passed to this function through the cxdsArgs and bcdsArgs parameters, respectively.

4.6.4 84824920

4.6.4.1 ScdsHybrid Doublet Assignment

4.6.4.2 ScdsHybrid Doublet Score

4.6.4.3 Density Score

4.6.4.4 Violin Score

4.6.4.5 Parameters

seed 12345
nTop 500
cxdsArgs NULL
bcdsArgs NULL
verb FALSE
estNdbl TRUE
force FALSE
useAssay counts
packageVersion 1.10.0

All parameters from the runCxds and runBcds functions may be passed to this function through the cxdsArgs and bcdsArgs parameters, respectively.

4.6.5 c47a5959

4.6.5.1 ScdsHybrid Doublet Assignment

4.6.5.2 ScdsHybrid Doublet Score

4.6.5.3 Density Score

4.6.5.4 Violin Score

4.6.5.5 Parameters

seed 12345
nTop 500
cxdsArgs NULL
bcdsArgs NULL
verb FALSE
estNdbl TRUE
force FALSE
useAssay counts
packageVersion 1.10.0

All parameters from the runCxds and runBcds functions may be passed to this function through the cxdsArgs and bcdsArgs parameters, respectively.

5 Ambient RNA Detection

5.1 Ambient RNA Detection Summary

5.1.1 decontX

5.2 DecontX

In droplet-based single-cell technologies, ambient RNA that may have been released from apoptotic or damaged cells can get incorporated into another droplet and lead to contamination. decontX, available from the celda package, is a Bayesian method for identifying the contamination level of each cell. The wrapper function runDecontX can be used to run the DecontX algorithm on its own, and the wrapper function plotDecontXResults can be used to plot its QC outputs. The outputs of runDecontX are decontX_contamination and decontX_clusters. decontX_contamination is a numeric vector which characterizes the level of contamination in each cell. Clustering is performed as part of the runDecontX algorithm. decontX_clusters is the resulting cluster assignment, which can also be labeled on the plot.
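The summary table in section 2.1 reports the mean and the median of decontX_contamination for each sample. The short Python sketch below, using made-up contamination values, shows how a few highly contaminated cells pull the mean above the median. In this report the mean exceeds the median in all five samples, which may point to such a right-skewed distribution.

```python
from statistics import mean, median

# Made-up per-cell contamination fractions (values between 0 and 1).
contamination = [0.01, 0.02, 0.03, 0.04, 0.40]

mean_contamination = round(mean(contamination), 3)
median_contamination = round(median(contamination), 3)
print(mean_contamination, median_contamination)
```

The density and violin plots of the contamination score for each sample are the place to confirm whether the distribution is in fact skewed.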

5.2.1 d8b737fb

5.2.1.1 DecontX Contamination Score

5.2.1.2 DecontX Clusters

5.2.1.3 Density Score

5.2.1.4 Violin Score

5.2.1.5 Parameters

z NULL
maxIter 500
delta 10 10
estimateDelta TRUE
convergence 0.001
varGenes 5000
dbscanEps 1
logfile NULL
verbose TRUE
packageVersion 1.12.0

5.2.2 8628f96c

5.2.2.1 DecontX Contamination Score

5.2.2.2 DecontX Clusters

5.2.2.3 Density Score

5.2.2.4 Violin Score

5.2.2.5 Parameters

z NULL
maxIter 500
delta 10 10
estimateDelta TRUE
convergence 0.001
varGenes 5000
dbscanEps 1
logfile NULL
verbose TRUE
packageVersion 1.12.0

5.2.3 e7372715

5.2.3.1 DecontX Contamination Score

5.2.3.2 DecontX Clusters

5.2.3.3 Density Score

5.2.3.4 Violin Score

5.2.3.5 Parameters

z NULL
maxIter 500
delta 10 10
estimateDelta TRUE
convergence 0.001
varGenes 5000
dbscanEps 1
logfile NULL
verbose TRUE
packageVersion 1.12.0

5.2.4 84824920

5.2.4.1 DecontX Contamination Score

5.2.4.2 DecontX Clusters

5.2.4.3 Density Score

5.2.4.4 Violin Score

5.2.4.5 Parameters

z NULL
maxIter 500
delta 10 10
estimateDelta TRUE
convergence 0.001
varGenes 5000
dbscanEps 1
logfile NULL
verbose TRUE
packageVersion 1.12.0

5.2.5 c47a5959

5.2.5.1 DecontX Contamination Score

5.2.5.2 DecontX Clusters

5.2.5.3 Density Score

5.2.5.4 Violin Score

5.2.5.5 Parameters

z NULL
maxIter 500
delta 10 10
estimateDelta TRUE
convergence 0.001
varGenes 5000
dbscanEps 1
logfile NULL
verbose TRUE
packageVersion 1.12.0


5.3 SoupX

In droplet-based single-cell technologies, ambient RNA that may have been released from apoptotic or damaged cells can get incorporated into another droplet and lead to contamination. SoupX uses non-expressed genes to estimate a global contamination fraction. The wrapper function runSoupX can be used to run the SoupX algorithm on its own. The main outputs of runSoupX are soupX_contamination, soupX_clusters, and the corrected assay SoupX, together with other intermediate metrics that SoupX generates. soupX_contamination is a numeric vector which characterizes the level of contamination in each cell. Note that SoupX generates one global contamination estimate per sample rather than cell-specific estimates. Clustering is required for the SoupX algorithm and is performed if users do not provide cluster labels as input; the quickCluster() method from the scran package is used for this purpose. soupX_clusters is the resulting cluster assignment, which can also be labeled on the plot. The wrapper function plotSoupXResults can be used to plot the QC outputs from the SoupX algorithm. The plots include a UMAP with clustering labels and a number of UMAPs colored by the soup fraction of the top marker genes identified for contamination estimation.
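How a global contamination fraction translates into corrected counts can be sketched as follows. This Python example assumes a single-pass simplification of a subtraction-style correction: the expected ambient counts of a cell are its total counts multiplied by the contamination fraction, distributed over genes according to an ambient expression profile, and removed from the observed counts. The function subtract_soup and all values are hypothetical, and SoupX's own procedure is more involved.

```python
def subtract_soup(cell_counts, soup_profile, rho):
    """Remove the expected ambient counts from one cell.

    cell_counts: dict gene -> observed count in the cell.
    soup_profile: dict gene -> fraction of ambient RNA from that gene (sums to 1).
    rho: global contamination fraction for the sample.
    """
    n_umi = sum(cell_counts.values())
    expected_soup_total = rho * n_umi
    corrected = {}
    for gene, observed in cell_counts.items():
        expected = expected_soup_total * soup_profile.get(gene, 0.0)
        # Corrected counts cannot go below zero.
        corrected[gene] = max(0.0, observed - expected)
    return corrected


toy_cell = {"HBB": 10, "CD3E": 50, "MS4A1": 40}       # toy counts
toy_soup = {"HBB": 0.5, "CD3E": 0.25, "MS4A1": 0.25}  # toy ambient profile
corrected = subtract_soup(toy_cell, toy_soup, rho=0.1)
print(corrected)
```

The corrected values are generally not integers, which matches the roundToInt FALSE setting in the parameter tables of this section.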

5.3.1 d8b737fb

5.3.1.1 SoupX Clustering

5.3.1.2 Soup Fractions

5.3.1.2.1 ENSG00000127329

5.3.1.2.2 ENSG00000186335

5.3.1.2.3 ENSG00000169347

5.3.1.2.4 ENSG00000242029

5.3.1.2.5 ENSG00000251504

5.3.1.3 Parameters

useAssay counts
bgAssayName NULL
assayName SoupX
tfidfMin 1
soupQuantile 0.9
maxMarkers 100
contaminationRange 0.01 0.8
rhoMaxFDR 0.2
priorRho 0.05
priorRhoStdDev 0.1
forceAccept FALSE
adjustMethod subtraction
roundToInt FALSE
tol 0.001
pCut 0.01
reducedDimName SoupX_UMAP_d8b737fb
sessionInfo x86_64-pc-linux-gnu
cluster soupX_clusters

5.3.2 8628f96c

5.3.2.1 SoupX Clustering

5.3.2.2 Soup Fractions

5.3.2.2.1 ENSG00000070915

5.3.2.2.2 ENSG00000074803

5.3.2.2.3 ENSG00000109684

5.3.2.2.4 ENSG00000183580

5.3.2.2.5 ENSG00000251504

5.3.2.3 Parameters

useAssay counts
bgAssayName NULL
assayName SoupX
tfidfMin 1
soupQuantile 0.9
maxMarkers 100
contaminationRange 0.01 0.8
rhoMaxFDR 0.2
priorRho 0.05
priorRhoStdDev 0.1
forceAccept FALSE
adjustMethod subtraction
roundToInt FALSE
tol 0.001
pCut 0.01
reducedDimName SoupX_UMAP_8628f96c
sessionInfo x86_64-pc-linux-gnu
cluster soupX_clusters

5.3.3 e7372715

5.3.3.1 SoupX Clustering

5.3.3.2 Soup Fractions

5.3.3.2.1 ENSG00000143882

5.3.3.2.2 ENSG00000169344

5.3.3.2.3 ENSG00000127329

5.3.3.2.4 ENSG00000186335

5.3.3.2.5 ENSG00000070915

5.3.3.3 Parameters

useAssay counts
bgAssayName NULL
assayName SoupX
tfidfMin 1
soupQuantile 0.9
maxMarkers 100
contaminationRange 0.01 0.8
rhoMaxFDR 0.2
priorRho 0.05
priorRhoStdDev 0.1
forceAccept FALSE
adjustMethod subtraction
roundToInt FALSE
tol 0.001
pCut 0.01
reducedDimName SoupX_UMAP_e7372715
sessionInfo x86_64-pc-linux-gnu
cluster soupX_clusters

5.3.4 84824920

5.3.4.1 SoupX Clustering

5.3.4.2 Soup Fractions

5.3.4.2.1 ENSG00000070915

5.3.4.2.2 ENSG00000127329

5.3.4.2.3 ENSG00000148942

5.3.4.2.4 ENSG00000183287

5.3.4.2.5 ENSG00000119121

5.3.4.3 Parameters

useAssay counts
bgAssayName NULL
assayName SoupX
tfidfMin 1
soupQuantile 0.9
maxMarkers 100
contaminationRange 0.01 0.8
rhoMaxFDR 0.2
priorRho 0.05
priorRhoStdDev 0.1
forceAccept FALSE
adjustMethod subtraction
roundToInt FALSE
tol 0.001
pCut 0.01
reducedDimName SoupX_UMAP_84824920
sessionInfo x86_64-pc-linux-gnu
cluster soupX_clusters

5.3.5 c47a5959

5.3.5.1 SoupX Clustering

5.3.5.2 Soup Fractions

5.3.5.2.1 ENSG00000242029

5.3.5.2.2 ENSG00000074803

5.3.5.2.3 ENSG00000250799

5.3.5.2.4 ENSG00000251504

5.3.5.2.5 ENSG00000143882

5.3.5.3 Parameters

useAssay counts
bgAssayName NULL
assayName SoupX
tfidfMin 1
soupQuantile 0.9
maxMarkers 100
contaminationRange 0.01 0.8
rhoMaxFDR 0.2
priorRho 0.05
priorRhoStdDev 0.1
forceAccept FALSE
adjustMethod subtraction
roundToInt FALSE
tol 0.001
pCut 0.01
reducedDimName SoupX_UMAP_c47a5959
sessionInfo x86_64-pc-linux-gnu
cluster soupX_clusters


6 Session Information

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS:   /share/pkg.7/r/4.1.1_noblas/install/lib64/R/lib/libRblas.so
## LAPACK: /share/pkg.7/r/4.1.1_noblas/install/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] TENxPBMCData_1.12.0         HDF5Array_1.22.1            rhdf5_2.38.1               
##  [4] cowplot_1.1.1               dplyr_1.0.10                ggplot2_3.3.6              
##  [7] singleCellTK_2.7.1          DelayedArray_0.20.0         Matrix_1.5-0               
## [10] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0 Biobase_2.54.0             
## [13] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0             
## [16] S4Vectors_0.32.4            BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
## [19] matrixStats_0.62.0         
## 
## loaded via a namespace (and not attached):
##   [1] AnnotationHub_3.2.2           BiocFileCache_2.2.1           systemfonts_1.0.4            
##   [4] BiocParallel_1.28.3           scater_1.22.0                 digest_0.6.29                
##   [7] htmltools_0.5.3               viridis_0.6.2                 fansi_1.0.3                  
##  [10] magrittr_2.0.3                memoise_2.0.1                 ScaledMatrix_1.2.0           
##  [13] GSVAdata_1.30.0               limma_3.50.3                  Biostrings_2.62.0            
##  [16] R.utils_2.12.0                svglite_2.1.0                 colorspace_2.0-3             
##  [19] blob_1.2.3                    rvest_1.0.2                   rappdirs_0.3.3               
##  [22] ggrepel_0.9.1                 xfun_0.32                     crayon_1.5.1                 
##  [25] RCurl_1.98-1.8                jsonlite_1.8.0                glue_1.6.2                   
##  [28] kableExtra_1.3.4              gtable_0.3.1                  zlibbioc_1.40.0              
##  [31] XVector_0.34.0                webshot_0.5.2                 BiocSingular_1.10.0          
##  [34] DropletUtils_1.14.2           Rhdf5lib_1.16.0               fishpond_2.0.1               
##  [37] scales_1.2.1                  DBI_1.1.3                     edgeR_3.36.0                 
##  [40] Rcpp_1.0.9                    viridisLite_0.4.1             xtable_1.8-6                 
##  [43] reticulate_1.26               dqrng_0.3.0                   bit_4.0.4                    
##  [46] rsvd_1.0.5                    httr_1.4.4                    FNN_1.1.3.1                  
##  [49] ellipsis_0.3.2                pkgconfig_2.0.3               R.methodsS3_1.8.2            
##  [52] farver_2.1.1                  scuttle_1.4.0                 sass_0.4.2                   
##  [55] uwot_0.1.14                   dbplyr_2.2.1                  locfit_1.5-9.6               
##  [58] utf8_1.2.2                    tidyselect_1.1.2              labeling_0.4.2               
##  [61] rlang_1.0.5                   later_1.3.0                   AnnotationDbi_1.56.2         
##  [64] munsell_0.5.0                 BiocVersion_3.14.0            tools_4.1.1                  
##  [67] cachem_1.0.6                  cli_3.4.0                     generics_0.1.3               
##  [70] RSQLite_2.2.17                ExperimentHub_2.2.1           evaluate_0.16                
##  [73] stringr_1.4.1                 fastmap_1.1.0                 yaml_2.3.5                   
##  [76] knitr_1.40                    bit64_4.0.5                   purrr_0.3.4                  
##  [79] KEGGREST_1.34.0               sparseMatrixStats_1.6.0       mime_0.12                    
##  [82] R.oo_1.25.0                   xml2_1.3.3                    compiler_4.1.1               
##  [85] rstudioapi_0.14               beeswarm_0.4.0                filelock_1.0.2               
##  [88] curl_4.3.2                    png_0.1-7                     interactiveDisplayBase_1.32.0
##  [91] tibble_3.1.8                  bslib_0.4.0                   stringi_1.7.8                
##  [94] highr_0.9                     lattice_0.20-45               vctrs_0.4.1                  
##  [97] pillar_1.8.1                  lifecycle_1.0.2               rhdf5filters_1.6.0           
## [100] BiocManager_1.30.18           jquerylib_0.1.4               RcppAnnoy_0.0.19             
## [103] BiocNeighbors_1.12.0          bitops_1.0-7                  irlba_2.3.5                  
## [106] httpuv_1.6.6                  R6_2.5.1                      promises_1.2.0.1             
## [109] gridExtra_2.3                 vipor_0.4.5                   codetools_0.2-18             
## [112] gtools_3.9.3                  assertthat_0.2.1              withr_2.5.0                  
## [115] GenomeInfoDbData_1.2.7        parallel_4.1.1                grid_4.1.1                   
## [118] beachmat_2.10.0               rmarkdown_2.16                DelayedMatrixStats_1.16.0    
## [121] shiny_1.7.2                   ggbeeswarm_0.6.0